Exploratory Analysis

We would like to use visualization to find patterns in NYC felony frequency with respect to time and date. Specifically, we want to know how month, day of the week, and hour of the day are associated with felony counts.

Average Daily Felony Frequency by Year and Month

First, we explore felony trends from 2016 through September 2022. We want to identify general trends over time and see whether there were any notable changes before and during the COVID-19 pandemic.

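num_days = function(month, year) {
  
  # Go through as.character() first: year is a factor after fct_rev(), and
  # as.integer() on a factor returns its level codes, not the year values
  year = as.integer(as.character(year))
  months = 1:12
  names(months) = month.abb
  month = months[as.character(month)]   # e.g. "Feb" -> 2
  
  # The day before the first of the following month is the last day of this month
  as.numeric(strftime(as.Date(paste(year + month %/% 12, month %% 12 + 1, "01", sep = "-")) - 1, "%d"))
  
}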

complaint %>% 
  filter(level == "FELONY") %>% 
  mutate(year = fct_rev(year)) %>% 
  group_by(year, month) %>% 
  dplyr::summarize(mean_freq = n() / num_days(month, year)) %>% 
  plot_ly(
    x = ~month, y = ~year, z = ~mean_freq,
    type = "heatmap"
  ) %>% 
  colorbar(x = 1, y = 1) %>% 
  layout(
    title = "Average Daily Felony Frequency by Year and Month",
    xaxis = list(title = "Month"),
    yaxis = list(title = "Year")
  )
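The day count in num_days() relies on a small date trick: the day before the first of the following month is the last day of the current month. The standalone base-R sketch below isolates that trick under a hypothetical name, days_in_month(), and also shows why factor inputs must pass through as.character() before as.integer(): converting a factor directly returns its internal level codes rather than the years, which breaks leap-year handling for February.

```r
# Hypothetical helper: number of days in a month, given integer month (1-12) and year
days_in_month <- function(month, year) {
  # First day of the following month (December rolls over to January of year + 1)
  next_first <- as.Date(sprintf("%04d-%02d-01",
                                as.integer(year + month %/% 12),
                                as.integer(month %% 12 + 1)))
  # One day earlier is the last day of the requested month
  as.integer(format(next_first - 1, "%d"))
}

# Pitfall: as.integer() on a factor returns level codes, not the labels
yr <- factor(c("2016", "2019", "2020"))
as.integer(yr)                                  # 1 2 3 (level codes)
as.integer(as.character(yr))                    # 2016 2019 2020
days_in_month(2, as.integer(as.character(yr)))  # 29 28 29
```

If lubridate is available, lubridate::days_in_month() gives the same result for Date inputs.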

Before 2020, felony frequency appears to have been slightly lower in colder months and slightly higher in warmer months, with little change from year to year. Since 2020, felony frequency has been more variable than before the pandemic. April 2020 shows the lowest frequency in the observed period, probably due to the statewide stay-at-home order. In addition, since the beginning of 2022, felony frequency has increased markedly compared with previous years, peaking in June and July.

Temporal Heat Map for Felony Crimes

Next, we plot the average hourly frequency of felonies by hour of the day and day of the week.

complaint %>% 
  filter(level == "FELONY") %>% 
  drop_na(hour) %>% 
  mutate(day_of_week = fct_rev(day_of_week)) %>% 
  group_by(hour, day_of_week) %>% 
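  # Divide by ~352, the approximate number of weeks (i.e., occurrences of
  # each weekday) between January 2016 and September 2022
  dplyr::summarize(mean_freq = n() / 352) %>% 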
  plot_ly(
    x = ~hour, y = ~day_of_week, z = ~mean_freq,
    type = "heatmap"
  ) %>% 
  colorbar(x = 1, y = 1) %>% 
  layout(
    title = "Average Hourly Felony Frequency by Time of the Week",
    xaxis = list(title = "Hour of the Day"),
    yaxis = list(title = "Day of the Week")
  )
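The divisor 352 in the code is presumably the number of weeks covered by the data. Assuming the data run from 2016-01-01 through 2022-09-30 (an assumption based on the stated date range), a quick base-R check counts how often each weekday occurs in that range:

```r
# Assumed data range: 2016-01-01 through 2022-09-30
all_days <- seq(as.Date("2016-01-01"), as.Date("2022-09-30"), by = "day")

# Numeric weekday (0 = Sunday, ..., 6 = Saturday) avoids locale-dependent names
wday <- as.POSIXlt(all_days)$wday
weekday_counts <- table(factor(wday, levels = 0:6,
                               labels = c("Sun", "Mon", "Tue", "Wed",
                                          "Thu", "Fri", "Sat")))

length(all_days)   # 2465 days
weekday_counts     # 352 for every weekday except Friday (353)
```

Because one weekday occurs 353 times under this assumption, dividing every cell by 352 slightly overstates that day's average; dividing each cell by its own weekday count would remove this small bias.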

From the heatmap above, we can observe the following patterns in hourly felony frequency over the course of a week:

  • During weekdays (Monday to Friday), felony frequency is highest in the afternoon and early evening (3pm-7pm) and gradually declines through the night, reaching its lowest point in the early morning (3am-6am).
  • During weekends (Saturday and Sunday), felony frequency in the afternoon is lower than on weekdays, but it does not drop substantially until after midnight. Frequency in the late night and early morning (12am-5am) is clearly higher than on weekdays, and the low point comes later, at around 6-7am.

We next draw the same heatmap restricted to robbery.
complaint %>% 
  filter(level == "FELONY") %>% 
  filter(offense == "ROBBERY") %>% 
  drop_na(hour) %>% 
  mutate(day_of_week = fct_rev(day_of_week)) %>% 
  group_by(hour, day_of_week) %>% 
  # Same ~352-week divisor as in the felony heatmap
  dplyr::summarize(mean_freq = n() / 352) %>% 
  plot_ly(
    x = ~hour, y = ~day_of_week, z = ~mean_freq,
    type = "heatmap"
  ) %>% 
  colorbar(x = 1, y = 1) %>% 
  layout(
    title = "Average Hourly Robbery Frequency by Time of the Week",
    xaxis = list(title = "Hour of the Day"),
    yaxis = list(title = "Day of the Week")
  )

The pattern for robbery is broadly similar to that for felonies overall: fewer robberies in the morning and more in the afternoon and evening. However, robbery frequency late at night (12am-4am) on weekends is relatively high and peaks at 4am on Sunday. Since robberies typically occur in public places such as streets, and far fewer people are likely to be outside late at night than during the day, the risk per person of being robbed is likely much higher for those who are out at this time.
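Reading an exact peak off a color scale can be imprecise. A hypothetical helper, peak_cell(), returns the row with the largest mean_freq; it could be applied to the summarized tibble (the output of summarize(), before plot_ly()) to confirm the Sunday 4am peak numerically. The toy grid below is made-up placeholder data used only to exercise the function, not the NYC data.

```r
# Hypothetical helper: return the (day, hour) cell with the highest average frequency
peak_cell <- function(summary_df) {
  summary_df[which.max(summary_df$mean_freq), c("day_of_week", "hour", "mean_freq")]
}

# Made-up placeholder grid, NOT the NYC data: 24 hours x 2 days
toy <- expand.grid(hour = 0:23, day_of_week = c("Saturday", "Sunday"),
                   stringsAsFactors = FALSE)
toy$mean_freq <- seq_len(nrow(toy)) %% 7  # arbitrary values

peak <- peak_cell(toy)
peak
```

which.max() returns the first maximum, so ties are resolved in favor of the earliest row.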

Statistical Testing

Felony Frequency by Season

The visualization suggests that, before 2020, the difference in felony frequency between colder and warmer months was not very pronounced. We use a one-way ANOVA to test whether mean daily felony counts are equal across the four seasons in the pre-COVID years (2016-2019).

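\(H_0\): \(\mu_{\text{Spring}} = \mu_{\text{Summer}} = \mu_{\text{Fall}} = \mu_{\text{Winter}}\), i.e., the mean daily felony frequency is the same in all four seasons.

\(H_1\): At least two seasons have different mean daily felony frequencies.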
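For reference, the one-way ANOVA compares between-season and within-season variability through the standard F statistic below. Here \(k = 4\) seasons and, assuming each of the days in 2016-2019 contributes one observation, \(N = 1461\); under \(H_0\) the statistic follows an F distribution with \(k - 1\) and \(N - k\) degrees of freedom:

\[
F = \frac{SS_{\text{season}} / (k - 1)}{SS_{\text{resid}} / (N - k)} \sim F_{k-1,\,N-k} = F_{3,\,1457} \quad \text{under } H_0
\]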

daily_by_season =
  complaint %>% 
  filter(level == "FELONY") %>% 
  filter(year %in% 2016:2019) %>% 
  mutate(
    season = case_when(
      month %in% c("Mar", "Apr", "May") ~ "Spring",
      month %in% c("Jun", "Jul", "Aug") ~ "Summer",
      month %in% c("Sep", "Oct", "Nov") ~ "Fall",
      month %in% c("Dec", "Jan", "Feb") ~ "Winter"
    ),
    season = as.factor(season)
  ) %>% 
  group_by(covid_state, year, month, day, season) %>% 
  dplyr::summarize(n_obs = n())

daily_by_season %>% 
  lm(n_obs ~ season, data = .) %>% 
  anova() %>% 
  knitr::kable(caption = "One Way ANOVA of Felony Frequency and Seasons")
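One Way ANOVA of Felony Frequency and Seasons

| Term      |   Df |    Sum Sq |   Mean Sq |  F value | Pr(>F) |
|-----------|-----:|----------:|----------:|---------:|-------:|
| season    |    3 |  513449.9 | 171149.96 | 85.66409 |      0 |
| Residuals | 1457 | 2910968.9 |   1997.92 |          |        |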

Since the p-value (shown as 0 after rounding) is less than 0.05, we reject the null hypothesis. We have sufficient evidence to conclude that at least two seasons have different mean daily felony frequencies in the pre-COVID years.
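The p-value column shows 0 only because the value is far below the printing precision. As a sanity check, the F statistic and its exact upper-tail probability can be recomputed from the Sum Sq and Df columns of the table; this sketch uses only base R's pf():

```r
# Values copied from the ANOVA table above
ss  <- c(season = 513449.9, residuals = 2910968.9)  # sums of squares
dfs <- c(season = 3, residuals = 1457)              # degrees of freedom
ms  <- ss / dfs                                     # mean squares

f_value <- ms[["season"]] / ms[["residuals"]]
# Upper-tail probability of the F(3, 1457) distribution
p_value <- pf(f_value, dfs[["season"]], dfs[["residuals"]], lower.tail = FALSE)

f_value
p_value
```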

We then conduct a post-hoc analysis to determine which pairs of seasons differ. Because six pairwise comparisons are made, we apply the Bonferroni adjustment, which multiplies each p-value by the number of comparisons (capped at 1). This controls the family-wise error rate, i.e., the probability of rejecting at least one true null hypothesis across all comparisons.
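A minimal illustration of the Bonferroni adjustment with base R's p.adjust(). The raw p-values below are hypothetical, chosen only to show the arithmetic; four seasons give choose(4, 2) = 6 comparisons.

```r
# Number of pairwise comparisons among four seasons
m <- choose(4, 2)

# Hypothetical raw (unadjusted) p-values, for illustration only
raw_p <- c(0.001, 0.004, 0.012, 0.020, 0.30, 0.90)
adj_p <- p.adjust(raw_p, method = "bonferroni")

# Bonferroni: multiply each p-value by the number of tests, capped at 1
stopifnot(isTRUE(all.equal(adj_p, pmin(1, raw_p * m))))
adj_p
```

Comparing an adjusted p-value with 0.05 is equivalent to comparing the raw p-value with 0.05 / 6.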

pairwise.t.test(daily_by_season$n_obs, daily_by_season$season, p.adjust.method = "bonferroni")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  daily_by_season$n_obs and daily_by_season$season 
## 
##        Fall    Spring  Summer 
## Spring 6.8e-12 -       -      
## Summer 3.0e-05 < 2e-16 -      
## Winter < 2e-16 0.072   < 2e-16
## 
## P value adjustment method: bonferroni

We have sufficient evidence that the mean daily felony frequencies of the four seasons differ from one another, with the exception of spring and winter (adjusted p = 0.072).
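Reading a lower-triangular p-value matrix by eye is error-prone. The sketch below pulls the non-significant pairs out of a pairwise.t.test() result programmatically. It uses R's built-in airquality data (Ozone by month) as a stand-in, so only the extraction step carries over; run on the stored result of the felony call above, the same steps would be expected to return only the Winter-Spring pair.

```r
# Built-in example data (NOT the felony data): Ozone by month
aq <- airquality
aq$Month <- factor(aq$Month, labels = month.abb[5:9])
res <- pairwise.t.test(aq$Ozone, aq$Month, p.adjust.method = "bonferroni")

# res$p.value is lower-triangular, with NA above the diagonal
pm  <- res$p.value
idx <- which(!is.na(pm) & pm >= 0.05, arr.ind = TRUE)

# One row per pair whose adjusted p-value is at least 0.05
non_sig <- data.frame(
  group1 = rownames(pm)[idx[, "row"]],
  group2 = colnames(pm)[idx[, "col"]],
  p_adj  = pm[idx]
)
non_sig
```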